Unsupervised Linguistically-Driven Reliable Dependency Parses Detection and Self-Training for Adaptation to the Biomedical Domain
Abstract
In this paper, a new self-training method for domain adaptation is illustrated, where the selection of reliable parses is carried out by an unsupervised linguistically-driven algorithm, ULISSE. The method has been tested on biomedical texts, with results showing a significant improvement with respect to the considered baselines, which demonstrates its ability to capture both the reliability of parses and the domain-specificity of linguistic constructions.
Related papers
ULISSE: an Unsupervised Algorithm for Detecting Reliable Dependency Parses
In this paper we present ULISSE, an unsupervised linguistically-driven algorithm to select reliable parses from the output of a dependency parser. Different experiments were devised to show that the algorithm is robust enough to deal with the output of different parsers and with different languages, as well as to be used across different domains. In all cases, ULISSE appears to outperform the b...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a large amount of labeled data with similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...
Learning Reliability of Parses for Domain Adaptation of Dependency Parsing
The accuracy of parsing has exceeded 90% recently, but this is not high enough to use parsing results practically in natural language processing (NLP) applications such as paraphrase acquisition and relation extraction. We present a method for detecting reliable parses out of the outputs of a single dependency parser. This technique is also applied to domain adaptation of dependency parsing. Ou...
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
We describe “treeblazing”, a method of using annotations from the GENIA treebank to constrain a parse forest from an HPSG parser. Combining this with self-training, we show significant dependency score improvements in a task of adaptation to the biomedical domain, reducing error rate by 9% compared to out-of-domain gold data and 6% compared to self-training. We also demonstrate improvements in ...
A Word Clustering Approach to Domain Adaptation: Effective Parsing of Biomedical Texts
We present a simple and effective way to perform out-of-domain statistical parsing by drastically reducing lexical data sparseness in a PCFG-LA architecture. We replace terminal symbols with unsupervised word clusters acquired from a large newspaper corpus augmented with biomedical target-domain data. The resulting clusters are effective in bridging the lexical gap between source-domain and targ...
Publication date: 2013